21 research outputs found
Multilingual word embeddings and their utility in cross-lingual learning
Word embeddings - dense vector representations of a word’s distributional semantics - are an indespensable component of contemporary natural language processing (NLP). Bilingual embeddings, in particular, have attracted much attention in recent years, given their inherent applicability to cross-lingual NLP tasks, such as Part-of-speech tagging and dependency parsing. However, despite recent advancements in bilingual embedding mapping, very little research has been dedicated to aligning embeddings multilingually, where word embeddings for a variable amount of languages are oriented to a single vector space. Given a proper alignment, one potential use case for multilingual embeddings is cross-lingual transfer learning, where a machine learning model trained on resource-rich languages (e.g. Finnish and Estonian) can “transfer” its salient features to a related language for which annotated resources are scarce (e.g. North Sami). The effect of the quality of this alignment on downstream cross-lingual NLP tasks has also been left largely unexplored, however.
With this in mind, our work is motivated by two goals. First, we aim to leverage existing supervised and unsupervised methods in bilingual embedding mapping towards inducing high quality multilingual embeddings. To this end, we propose three algorithms (one supervised, two unsupervised) and evaluate them against a completely supervised bilingual system and a commonly employed baseline approach. Second, we investigate the utility of multilingual embeddings in two common cross-lingual transfer learning scenarios: POS-tagging and dependency parsing. To do so, we train a joint POS-tagger/dependency parser on Universal Dependencies treebanks for a variety of Indo-European languages and evaluate it on other, closely related languages. Although we ultimately observe that, in most settings, multilingual word embeddings themselves do not induce a cross-lingual signal, our experimental framework and results offer many insights for future cross-lingual learning experiments
Multilingual word embeddings and their utility in cross-lingual learning
Word embeddings - dense vector representations of a word’s distributional semantics - are an indespensable component of contemporary natural language processing (NLP). Bilingual embeddings, in particular, have attracted much attention in recent years, given their inherent applicability to cross-lingual NLP tasks, such as Part-of-speech tagging and dependency parsing. However, despite recent advancements in bilingual embedding mapping, very little research has been dedicated to aligning embeddings multilingually, where word embeddings for a variable amount of languages are oriented to a single vector space. Given a proper alignment, one potential use case for multilingual embeddings is cross-lingual transfer learning, where a machine learning model trained on resource-rich languages (e.g. Finnish and Estonian) can “transfer” its salient features to a related language for which annotated resources are scarce (e.g. North Sami). The effect of the quality of this alignment on downstream cross-lingual NLP tasks has also been left largely unexplored, however.
With this in mind, our work is motivated by two goals. First, we aim to leverage existing supervised and unsupervised methods in bilingual embedding mapping towards inducing high quality multilingual embeddings. To this end, we propose three algorithms (one supervised, two unsupervised) and evaluate them against a completely supervised bilingual system and a commonly employed baseline approach. Second, we investigate the utility of multilingual embeddings in two common cross-lingual transfer learning scenarios: POS-tagging and dependency parsing. To do so, we train a joint POS-tagger/dependency parser on Universal Dependencies treebanks for a variety of Indo-European languages and evaluate it on other, closely related languages. Although we ultimately observe that, in most settings, multilingual word embeddings themselves do not induce a cross-lingual signal, our experimental framework and results offer many insights for future cross-lingual learning experiments
What can we learn from Semantic Tagging?
We investigate the effects of multi-task learning using the recently
introduced task of semantic tagging. We employ semantic tagging as an auxiliary
task for three different NLP tasks: part-of-speech tagging, Universal
Dependency parsing, and Natural Language Inference. We compare full neural
network sharing, partial neural network sharing, and what we term the learning
what to share setting where negative transfer between tasks is less likely. Our
findings show considerable improvements for all tasks, particularly in the
learning what to share setting, which shows consistent gains across all tasks.Comment: 9 pages with references and appendixes. EMNLP 2018 camera read
Attention Can Reflect Syntactic Structure (If You Let It)
Since the popularization of the Transformer as a general-purpose feature
encoder for NLP, many studies have attempted to decode linguistic structure
from its novel multi-head attention mechanism. However, much of such work
focused almost exclusively on English -- a language with rigid word order and a
lack of inflectional morphology. In this study, we present decoding experiments
for multilingual BERT across 18 languages in order to test the generalizability
of the claim that dependency syntax is reflected in attention patterns. We show
that full trees can be decoded above baseline accuracy from single attention
heads, and that individual relations are often tracked by the same heads across
languages. Furthermore, in an attempt to address recent debates about the
status of attention as an explanatory mechanism, we experiment with fine-tuning
mBERT on a supervised parsing objective while freezing different series of
parameters. Interestingly, in steering the objective to learn explicit
linguistic structure, we find much of the same structure represented in the
resulting attention patterns, with interesting differences with respect to
which parameters are frozen
Higher-order Comparisons of Sentence Encoder Representations
Representational Similarity Analysis (RSA) is a technique developed by
neuroscientists for comparing activity patterns of different measurement
modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has
several advantages over existing approaches to interpretation of language
encoders based on probing or diagnostic classification: namely, it does not
require large training samples, is not prone to overfitting, and it enables a
more transparent comparison between the representational geometries of
different models and modalities. We demonstrate the utility of RSA by
establishing a previously unknown correspondence between widely-employed
pretrained language encoders and human processing difficulty via eye-tracking
data, showcasing its potential in the interpretability toolbox for neural
modelsComment: EMNLP 201
The Search for Syntax : Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing
Syntax — the study of the hierarchical structure of language — has long featured as a prominent research topic in the field of natural language processing (NLP). Traditionally, its role in NLP was confined towards developing parsers: supervised algorithms tasked with predicting the structure of utterances (often for use in downstream applications). More recently, however, syntax (and syntactic theory) has factored much less into the development of NLP models, and much more into their analysis. This has been particularly true with the nascent relevance of language models: semi-supervised algorithms trained to predict (or infill) strings given a provided context. In this dissertation, I describe four separate studies that seek to explore the interplay between syntactic parsers and language models upon the backdrop of dependency syntax. In the first study, I investigate the error profiles of neural transition-based and graph-based dependency parsers, showing that they are effectively homogenized when leveraging representations from pre-trained language models. Following this, I report the results of two additional studies which show that dependency tree structure can be partially decoded from the internal components of neural language models — specifically, hidden state representations and self-attention distributions. I then expand on these findings by exploring a set of additional results, which serve to highlight the influence of experimental factors, such as the choice of annotation framework or learning objective, in decoding syntactic structure from model components. In the final study, I describe efforts to quantify the overall learnability of a large set of multilingual dependency treebanks — the data upon which the previous experiments were based — and how it may be affected by factors such as annotation quality or tokenization decisions. Finally, I conclude the thesis with a conceptual analysis that relates the aforementioned studies to a broader body of work concerning the syntactic knowledge of language models